Mechanical Inference Problems in Continuous Speech Understanding

Authors

  • William A. Woods
  • John Makhoul
Abstract

This paper presents and discusses examples of mechanical inference problems which must be solved in order to construct effective mechanical speech understanding systems. The examples are taken from incremental simulations of a prototype speech understanding system which will use syntactic, semantic, and pragmatic information as well as acoustical and phonological information to mechanically "understand" continuous speech utterances.

Introduction

In experiments in spectrogram reading [1] the performance obtained by human experts for phonetic segmentation and labeling without conscious appeal to syntactic, semantic, or vocabulary constraints was: approximately 75% of the segments correctly labeled (with either a complete or a partial phonetic specification), 15% mislabeled, and 10% segments missed. The fact that human experts with years of experience in looking at spectrograms and a detailed understanding of the acoustic characteristics of speech sounds find it impossible to uniquely decide which of several possible phonemes are present in a given segment of speech signal, and the fact that they make a significant number of errors both in segmenting the signal into phonetic units and in labeling these units, make it unlikely that any mechanical acoustical processing component will be able to segment and label continuous speech signals with very high reliability using only acoustic information. Moreover, it is likely that this indeterminacy in the acoustic domain is a fundamental property of human speech and not just an inadequacy in the analyzer.
However, in the same experiments, when the spectrogram reader used syntactic, semantic, and vocabulary constraints to attempt to identify the words in the sentences (using a computerized word retrieval routine which facilitated the vocabulary searches) the success rate for word identification was 96%. There is hope therefore that with the proper use of syntactic, semantic, and vocabulary constraints one could build a system to understand continuous speech at a comparable level even though the acoustic segmenter and labeler operate with a significant error rate. Of course, in both the initial segmentation and labeling and in the subsequent application of syntactic and semantic constraints, the attainment with a mechanical algorithm of performance comparable to that of a human is no small task.

The BBN Speech Project

The speech project at Bolt Beranek and Newman [2, 5, 6] is endeavoring to construct a computer system which approaches the performance of human spectrogram readers at deciphering the meaning of continuous spoken sentences. The task of this system will be to "understand" spoken sentences and take appropriate actions. Note that this task does not include producing an accurate phonetic transcription of the input or even necessarily an accurate list of the successive words of the input (although it would be hard to imagine it taking the appropriate action if it did not in fact identify most of the words).
What we are emphasizing here is that if the acoustics is unable to resolve the decision between two phonemes or between two words at some point in the sentence, but the remaining components are able to decide the meaning of the sentence in any case (e.g. the meaning is the same regardless of which phoneme or word is chosen), then the sentence will be deemed to have been correctly understood. It is this difference in what is required for a correct output that distinguishes what the members of the ARPA speech project [3] have been calling "speech understanding" from the more traditional "speech recognition".

By examining the teletype protocols of the Klatt and Stevens experiment [1], we were able to gather considerable information about the problem solving processes and strategies which those researchers used to untangle the meanings of spectrograms. On the basis of these protocols one can conceptually decompose the speech understanding process into a number of components or routines corresponding to the different types of knowledge and inference techniques applied.
These components included (1) EXTRACT, the routine which performs the phonetic segmentation and labeling of the acoustic signal (both segmenting and labeling are intimately cross connected), (2) LEXRET, a lexical retrieval routine which recovers possible words from the vocabulary on the basis of partial phonetic information (this component was machine implemented in the Klatt and Stevens experiment), (3) MATCH, a routine which compares a given candidate word against the speech signal at a given point and determines the quality of the match (this component is intended to include the use of phonological and acoustic-phonetic rules for
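
The excerpt breaks off here, but the roles of LEXRET and MATCH described above can be illustrated with a minimal Python sketch. Everything in it is a hypothetical illustration rather than the BBN system's implementation: the toy lexicon, the phoneme symbols, the function names `lexret` and `match_score`, and the scoring rule are all assumptions. It shows retrieval of candidate words from a partial phonetic specification, and a crude quality score for one candidate against the labeled segments.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy lexicon: word -> phoneme sequence (symbols are illustrative).
LEXICON: Dict[str, Tuple[str, ...]] = {
    "bit": ("b", "ih", "t"),
    "pit": ("p", "ih", "t"),
    "bat": ("b", "ae", "t"),
    "bid": ("b", "ih", "d"),
    "pin": ("p", "ih", "n"),
}

# A segment label is the set of phonemes the labeler considers possible;
# None means the segment is completely unspecified.
Segment = Optional[Set[str]]


def lexret(segments: List[Segment]) -> List[str]:
    """Return the words whose phonemes are compatible with every partial label."""
    hits = []
    for word, phones in LEXICON.items():
        if len(phones) != len(segments):
            continue  # simplification: assumes no missed or extra segments
        if all(seg is None or p in seg for p, seg in zip(phones, segments)):
            hits.append(word)
    return sorted(hits)


def match_score(word: str, segments: List[Segment]) -> float:
    """Fraction of the word's phonemes allowed by the aligned segment labels."""
    phones = LEXICON[word]
    if len(phones) != len(segments):
        return 0.0
    ok = sum(1 for p, seg in zip(phones, segments) if seg is None or p in seg)
    return ok / len(phones)


if __name__ == "__main__":
    # The labeler cannot decide between /b/ and /p/ and gives no vowel label.
    observed: List[Segment] = [{"b", "p"}, None, {"t"}]
    print(lexret(observed))
    print(match_score("bid", observed))
```

The one-to-one alignment between phonemes and segments is a deliberate simplification. Since the spectrogram-reading experiment reports that about 10% of segments were missed, a realistic matcher would also have to tolerate missing and spurious segments.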


Similar resources

A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining

Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...


Expressive Probability Models for Speech Recognition and Understanding

The paper is a brief summary of an invited talk given at the ASRU 99 conference. The principal points are as follows: first, that the expressive power of the probability models available for use in speech recognition and understanding has expanded significantly; second, that using expressive models such as dynamic Bayesian networks can result in improved learning rates and recognition results; a...


Auditory processing skills in brainstem level of autistic children: A Review Study

Aims: Autism is a pervasive developmental disorder. Deficit in sensory functions is one of the characteristics of people with autism, and these people usually show abnormality in processing and correctly interpreting auditory information. People with autism also show problems in communicating with others. This review article deals with the accurate understanding of auditory processing skills...


Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition

Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...


Power Series-Aftertreatment Technique for Nonlinear Cubic Duffing and Double-Well Duffing Oscillators

Modeling of large amplitude of structures such as slender, flexible cantilever beam and fluid-structure resting on nonlinear elastic foundations or subjected to stretching effects often leads to strongly nonlinear models of Duffing equations which are not amenable to exact analytical methods. In this work, explicit analytical solutions to the large amplitude nonlinear oscillation systems of cub...


Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...


Journal:
  • Artif. Intell.

Volume 5

Publication date: 1973